Spark 2.1 JSON operations (save/read)
Building configuration info:
case class BuildingConfig(buildingid: String, building_height: Long, gridcount: Long, gis_display_name: String, wear_loss: Double, path_loss: Double) extends Serializable
Write a JSON file to HDFS:
sql( s"""|select buildingid, |height, |gridcount, |collect_list(gis_display_name)[0] as gis_display_name, |avg(wear_loss) as wear_loss, |avg(path_loss) as path_loss |from |xxx |""".stripMargin) .map(s => BuildingConfig(s.getAs[String]("buildingid"), s.getAs[Int]("height"), s.getAs[Long]("gridcount"), s.getAs[String]("gis_display_name"), s.getAs[Double]("wear_loss"), s.getAs[Double]("path_loss"))) .toDF.write.format("org.apache.spark.sql.json").mode(SaveMode.Overwrite).save(s"/user/my/buidlingconfigjson/${p_city}")
Read the JSON file from HDFS:
/**
 * scala> buildingConfig.printSchema
 * root
 *  |-- building_height: long (nullable = true)
 *  |-- buildingid: string (nullable = true)
 *  |-- gis_display_name: string (nullable = true)
 *  |-- gridcount: long (nullable = true)
 *  |-- path_loss: double (nullable = true)
 *  |-- wear_loss: double (nullable = true)
 **/
spark.read.json(s"/user/my/buildingconfigjson/${p_city}")
  .map(s => BuildingConfig(
    s.getAs[String]("buildingid"),
    s.getAs[Long]("building_height"),
    s.getAs[Long]("gridcount"),
    s.getAs[String]("gis_display_name"),
    s.getAs[Double]("wear_loss"),
    s.getAs[Double]("path_loss")))
  .createOrReplaceTempView("building_scene_config")
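Because the JSON field names and types line up exactly with BuildingConfig, the manual getAs mapping can also be replaced by an encoder-based decode. A minimal sketch, assuming a SparkSession named spark with spark.implicits._ in scope:

import spark.implicits._

val buildingConfig = spark.read
  .json(s"/user/my/buildingconfigjson/${p_city}")
  .as[BuildingConfig]  // as[T] matches columns to case-class fields by name

buildingConfig.createOrReplaceTempView("building_scene_config")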
Fundamentals are what programmers should really dig into, for example:
1) How List/Set/Map are implemented internally and how they differ
2) MySQL index storage structure & how to tune it; B-tree characteristics, computational complexity, and the factors that affect that complexity...
3) JVM runtime structure, principles, and tuning
4) How Java class loaders work
5) How the GC process works in Java and the collection algorithms it uses
6) How consistent hashing is implemented in Redis and how it differs from ordinary hashing
7) Java multithreading and thread-pool development/management; the difference between Lock and synchronized (a minimal sketch follows this list)
8) Spring IoC/AOP principles; the loading process...
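On item 7, a minimal sketch of the practical difference (a hypothetical Counter class, not from the original post): synchronized is a block-scoped intrinsic lock, while ReentrantLock is an explicit lock object that adds features such as tryLock and fairness but must be released in a finally block:

import java.util.concurrent.locks.ReentrantLock

class Counter {
  private var n = 0
  private val lock = new ReentrantLock()

  // Intrinsic lock: acquired and released automatically around the block.
  def incSync(): Unit = synchronized { n += 1 }

  // Explicit lock: more flexible (tryLock, fairness), but must be unlocked in finally.
  def incLock(): Unit = {
    lock.lock()
    try n += 1
    finally lock.unlock()
  }

  def value: Int = synchronized { n }
}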